Abstract: How does one find important or influential people in an online social network? Researchers have proposed a variety of centrality measures to identify individuals that are, for example, often visited by a random walk, infected in an epidemic, or receive many messages from friends. Recent research suggests that a social media user's capacity to respond to an incoming message is constrained by their finite attention, which they divide over all incoming information, i.e., information sent by the users they follow. We propose a new measure of centrality, a limited-attention version of Bonacich's Alpha-Centrality, that models the effect of limited attention on epidemic diffusion. The new measure describes a process in which nodes broadcast messages to their out-neighbors, but the neighbors' ability to receive the message depends on the number of in-neighbors they have. We evaluate the proposed measure on real-world online social networks and show that it reproduces an empirical influence ranking of users better than other popular centrality measures.

I. Introduction 

An individual's position within a social network is thought to confer advantages, allowing him to exploit the structure of social ties to accumulate power, prestige, or influence. Many measures of centrality have been proposed to capture the importance of a position in a network. Some of these, like degree and betweenness centrality [11], measure an individual's ability to control the flow of information in the network. Other measures give higher centrality to those positions that are themselves connected to central positions [23], [29], [5]. The growing popularity of online social media has sparked new interest in centrality. Researchers have proposed using centrality to identify influential social media users [9], [2] whose endorsement can, for example, maximize the reach of a "viral" marketing campaign [24] or, conversely, who can most quickly stop a malicious rumor from spreading.

Most existing centrality measures examine the link structure of the network to identify key nodes within it. Take, for example, the Web, which is represented as a directed graph of hyperlinked Web pages. An important page within this graph is one that is visited often by Web surfers. This observation forms the basis of Google's original Web page ranking algorithm, PageRank [29]. By modeling Web surfing as a random walk, PageRank assigns a centrality score to each page based on its value in the equilibrium distribution of the random walk. However, a central individual in a social network through which disease is spreading is one who infects, either directly or indirectly, most others. Unlike Web surfing, the spread of a virus is modeled as an epidemic process. Thus, PageRank, which is intimately connected with random walks, will not identify key individuals in such a social network. Instead, a measure such as the Katz score [23] or Bonacich's Alpha-Centrality [4], which gives the equilibrium distribution of an epidemic process on a network [14], is more appropriate.

Now consider information spreading through an online social network, for instance, by users sending messages or product recommendations to their friends. While information spread in networks is often modeled as an epidemic process (e.g., [19], [28]), recent research suggests that psychological and cognitive factors are important in determining whether a person will see and act on friends' recommendations. Specifically, attention was shown to be a critical aspect of online behavior [13], [33], [32], [20]. Attention is the psychological mechanism that controls how we process incoming stimuli and decide what activities to engage in [22], [30]. Actions such as reading a tweet, browsing a Web page, or responding to email require mental effort, and since the human brain's capacity for mental effort is limited, so is attention. Moreover, online users must divide their attention over all incoming stimuli [20]. As a consequence, the more stimuli people have to process, the smaller the probability that they will respond to any one stimulus. Attention need not be distributed uniformly over friends (some friends may receive a greater share of a person's attention due to familiarity, trust, social closeness, or influence [16], [21]), but for simplicity we assume that each friend receives the same fraction of a person's attention. We call this phenomenon limited attention (la).

Limited, divided attention changes the nature of interactions between nodes in a network and, therefore, how central nodes are identified. Now a node's capacity to infect others depends not only on how many connections it has but also on who and how many others these nodes are connected to. In Section III we introduce a new centrality measure, limited-attention Alpha-Centrality (laAC), that models the attention-limited nature of social interactions, and we provide its mathematical definition. For completeness, we also introduce and define limited-attention PageRank (laPR), which models the effect of limited attention on a random walk process. In Section IV we evaluate the proposed algorithms and centrality measures on real-world data, including follower graphs from the social media sites Digg and Twitter. In the Appendix, we present fast approximate algorithms that allow us to calculate these measures even on large graphs, and we provide their performance guarantees.

II. Dynamics, Attention and Centrality 

Centrality measures examine the topology of a network to identify important or central nodes within it. It has been recognized recently, however, that centrality is the product of a network's links and the dynamical processes taking place on it, which determine how ideas, pathogens, or influence flow along social links [6], [26], [14]. Take, for example, one definition of centrality used by the popular PageRank algorithm [29]: a network node is important if it is often visited by a random walk. A random walk is a stochastic process that starts at some node and at each time step transitions to a randomly selected neighbor of the current node. Variants of the random walk are used to model flows in physical systems, e.g., chemical and heat diffusion, and can be used to model social phenomena resulting from one-to-one interactions, such as Web surfing, money exchange, and phone conversations.





Fig. 1. Different dynamical processes taking place on a network: (a) random walk, (b) epidemic spread, and their limited-attention variants: (c) limited-attention random walk and (d) limited-attention epidemic spread. In a limited-attention process, a node's capacity to receive a message depends on its in-degree.

In a social network, a message or a virus propagates by being broadcast by an infected individual to all her (out-)neighbors. Such processes are modeled as an epidemic (or contact) process. The difference between it and the random walk is illustrated in Figure 1, which shows the neighborhood of node a. Directed edges in this network represent, for example, hyperlinks between Web pages or who can call whom in a social network; in the context of social media, they can also indicate that b, c, and d follow a and receive broadcasts from her. Figure 1(a) illustrates a one-to-one interaction, e.g., a phone call, while Fig. 1(b) shows a one-to-many broadcast.

Until now, we have assumed that nodes have an unlimited capacity to receive incoming signals, whether Web surfers, phone calls, or messages from friends. This may not always be the case. Suppose a Web server can receive only a limited number of connections, in the extreme case only one. Then the probability that a Web surfer starting at a will reach b depends on whether the Web server in charge of b is able to receive an incoming request. In a social network, cognitive and perceptual factors can limit a person's capacity to process incoming messages [20]. Such factors collectively figure into the phenomenon we refer to as limited attention. This means that the probability that a user will respond to a message from a friend decreases with the number of friends she follows. This is illustrated graphically in Fig. 1(c) and (d). Node b is more likely to receive a message from a than node c is, because c receives messages from eight nodes, while b receives them from only one.

Different dynamic processes lead to different notions of centrality. PageRank is used to find nodes that are often visited by a random walk (with random restarts), while Alpha- (or Bonacich) Centrality identifies nodes that are often infected during an epidemic [14]. Below we define limited-attention PageRank and limited-attention Alpha-Centrality, centrality measures that take into account the finite attention of online social users. Limited-attention PageRank identifies nodes that are often visited by a random walk when each node's capacity to receive the walker depends on its in-degree. Similarly, limited-attention Alpha-Centrality identifies nodes that are often infected in an epidemic when each node's susceptibility to infection also depends on its in-degree.

III. Limited-Attention Centrality 

We represent a network as a directed graph with node set $V$ and edge set $E$. The adjacency matrix of the graph is defined as: $A[u,v] = 1$ if there is an edge from $u$ to $v$; otherwise, $A[u,v] = 0$. Also, $A[i,i] = 0$. The set of out-neighbors of $u$ is $\{v \in V \mid (u,v) \in E\}$ and the set of in-neighbors is $\{v \in V \mid (v,u) \in E\}$. Two other important quantities are the in-degree and out-degree matrices. The out-degree matrix $D_{out}$ is a diagonal matrix defined as $D_{out}[i,i] = \sum_j A[i,j] = (Ae^T)[i]$ and $D_{out}[i,j] = 0 \;\forall\, i \neq j$. Here, $e$ is a $|V|$-dimensional row vector of ones, and $e^T$ is its transpose. The in-degree matrix $D_{in}$ is a diagonal matrix defined as $D_{in}[i,i] = \sum_j A[j,i] = (eA)[i]$ and $D_{in}[i,j] = 0 \;\forall\, i \neq j$.
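The definitions above can be made concrete with a short sketch. The four-node edge list below is a made-up example (it does not come from the paper), and plain Python lists stand in for the matrices; only the diagonals of $D_{out}$ and $D_{in}$ are stored.

```python
# Toy directed graph as an edge list; (u, v) means an edge from u to v.
# This graph is a hypothetical example, not one of the paper's networks.
edges = [(0, 1), (0, 2), (1, 2), (3, 2)]
n = 4

# Adjacency matrix: A[u][v] = 1 if there is an edge from u to v, else 0.
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = 1

# Diagonal of D_out: row sums of A. Diagonal of D_in: column sums of A.
d_out = [sum(A[i][j] for j in range(n)) for i in range(n)]
d_in = [sum(A[j][i] for j in range(n)) for i in range(n)]

print(d_out)  # [2, 1, 0, 1]
print(d_in)   # [0, 1, 3, 0]
```

Node 2 has in-degree 3, so under the limited-attention assumption it would split its attention three ways, while node 1 devotes all of its attention to its single in-neighbor.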



A. Limited-attention PageRank: A PageRank vector $pr(\alpha, s)$ is the steady-state probability distribution of a random walk with restarts with a damping factor $\alpha$. This means that with probability $\alpha$ the walk transitions to one of the out-neighbors of the current node, and with probability $(1 - \alpha)$ it transitions to any node in the network. The starting vector $s$ gives the probability distribution for where the walk transitions after restarting, and it is usually taken to be the uniform vector $s = e/|V|$. The transfer matrix $D_{out}^{-1}A$ encodes the transition probabilities of a random walk on the network. The PageRank vector $pr(\alpha, s)$ is the unique solution of the following iterative equation:

$$pr(\alpha, s) = (1 - \alpha)\,s + \alpha\, pr(\alpha, s)\, D_{out}^{-1} A \tag{3.1}$$

Now, if a node's capacity to receive a random walker is limited, the transfer matrix must be modified. As stated above, we consider the simplest scenario, in which the finite capacity is divided uniformly between all incoming connections. This case is modeled by the transfer matrix $D_{out}^{-1} A D_{in}^{-1}$. Therefore, limited-attention PageRank ${}^{la}pr(\alpha, s)$ is the solution of the following iterative equation:

$${}^{la}pr(\alpha, s) = (1 - \alpha)\,s + \alpha\; {}^{la}pr(\alpha, s)\, D_{out}^{-1} A D_{in}^{-1} \tag{3.2}$$

The starting vector above is $s = e D_{in}^{-1}$. Note that while the PageRank transfer matrix $D_{out}^{-1}A$ is stochastic (each of its rows sums to one), this is no longer the case for the limited-attention PageRank transfer matrix.
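As a minimal sketch of Eqs. 3.1 and 3.2, the fixed-point iteration below computes both vectors on a small made-up graph. The function name `iterate_rank`, the example edge list, and the fixed iteration count are assumptions for illustration. The sketch also assumes every node has at least one in-link and one out-link, so no division by zero occurs (the paper instead adds a small constant to the degrees), and it takes $s = eD_{in}^{-1}$ for the limited-attention case.

```python
def iterate_rank(n, edges, alpha=0.85, limited=False, iters=200):
    """Iterate x <- (1 - alpha) * s + alpha * x * M as a row-vector update.

    M = D_out^{-1} A for PageRank (Eq. 3.1), or
    M = D_out^{-1} A D_in^{-1} for the limited-attention variant (Eq. 3.2).
    Assumes every node has in-degree >= 1 and out-degree >= 1.
    """
    d_out = [0] * n
    d_in = [0] * n
    for u, v in edges:
        d_out[u] += 1
        d_in[v] += 1
    if limited:
        s = [1.0 / d_in[i] for i in range(n)]  # s = e * D_in^{-1}
    else:
        s = [1.0 / n] * n                      # uniform s = e / |V|
    x = list(s)
    for _ in range(iters):
        nxt = [(1.0 - alpha) * s_i for s_i in s]
        for u, v in edges:
            weight = 1.0 / d_out[u]            # entry of D_out^{-1} A
            if limited:
                weight /= d_in[v]              # extra factor from D_in^{-1}
            nxt[v] += alpha * x[u] * weight
        x = nxt
    return x


# Hypothetical example graph in which every node has in- and out-links.
edges = [(0, 1), (0, 3), (1, 2), (2, 0), (3, 2)]
pr = iterate_rank(4, edges)
la_pr = iterate_rank(4, edges, limited=True)
```

Because the PageRank transfer matrix is row-stochastic and the uniform $s$ sums to one, the entries of `pr` sum to one; the limited-attention scores carry no such normalization.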

We illustrate the differences between PageRank and limited-attention PageRank on a toy directed network. Figure 2(a) shows this network with the size of each node proportional to its centrality score relative to other nodes, as determined by PageRank (with $\alpha = 0.85$). Node B is the most central, since it has many in-links, enabling a random walker to reach it via many different paths. Peripheral nodes H, I, J, etc., are less important, since they only receive the random walker via a random jump. Limited-attention PageRank, shown in Fig. 2(b), scores these nodes highly. In contrast, the node ranked highest by PageRank, B, dramatically decreases in centrality. This node divides its attention among many in-links, limiting its ability to receive a random walker along any specific link. The peripheral nodes have few in-links and are better able to receive the random walker, whether it is following an out-link or executing a random jump. Their importance, therefore, is greater in this scenario.

B. Limited-attention Alpha-Centrality: Alpha-Centrality measures the total number of paths from a node, exponentially attenuated by their length. Bonacich introduced this measure [4] as a generalization of the index of status proposed by Katz [23], and it is sometimes referred to as Bonacich centrality. The Alpha-Centrality matrix gives the number of attenuated paths between two nodes, and it is usually written as a power series expansion of the adjacency matrix, with attenuation parameter $\alpha > 0$:

$$C = A + \alpha A^2 + \alpha^2 A^3 + \cdots + \alpha^n A^{n+1} + \cdots$$

This series converges to $C = A(I - \alpha A)^{-1}$ while $\alpha < 1/\lambda_{max}$, where $\lambda_{max}$ is the largest eigenvalue of $A$ (i.e., the spectral radius of the network). Parameter $\alpha$ determines how far, on average, a node's effect will be felt and sets the length scale of interactions. When $\alpha$ is small, Alpha-Centrality probes only the local structure of the network. As $\alpha$ grows, more distant nodes contribute to the centrality score of a given node [15]. As $\alpha \to 1/\lambda_{max}$, the length scale of interactions diverges and it becomes a global measure.

Alpha-Centrality gives the steady-state distribution of an epidemic process on a network [14], where $\alpha$ is the probability to transmit a message or influence along a link. Therefore, the $(i,j)$th entry of the Alpha-Centrality matrix $C$ can be interpreted as the likelihood that the virus will reach node $j$ from node $i$. Summing over all columns $j$ gives the Alpha-Centrality score of node $i$, $ac(\alpha) = Ce^T = \sum_j C(i,j)$, the number of infections directly or indirectly caused by node $i$. Summing over the rows of the Alpha-Centrality matrix, on the other hand, gives $e \cdot C$, whose $j$th entry $\sum_i C(i,j)$ is the total number of times that node $j$ is infected by others.

The Alpha-Centrality vector $ac(\alpha, s)$ can also be defined iteratively as:

$$ac(\alpha, s) = s + \alpha A \cdot ac(\alpha, s), \tag{3.3}$$

where the starting vector $s = Ae^T$ is taken as out-degree centrality [1].
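To see how the iterative definition relates to the attenuated path count, the sketch below iterates $ac \leftarrow s + \alpha A \cdot ac$ with $s = Ae^T$ on a three-node chain. The helper names and the chain graph are illustrative assumptions. On this graph the series terminates after two terms, so the result can be checked by hand.

```python
def mat_vec(A, x):
    """Multiply a square matrix (list of rows) by a column vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def alpha_centrality(A, alpha, iters=200):
    """Fixed-point iteration ac = s + alpha * A * ac with s = A e^T."""
    n = len(A)
    s = mat_vec(A, [1.0] * n)  # out-degree of every node
    ac = list(s)
    for _ in range(iters):
        a_ac = mat_vec(A, ac)
        ac = [s[i] + alpha * a_ac[i] for i in range(n)]
    return ac


# Hypothetical chain 0 -> 1 -> 2.
A = [[0, 1, 0],
     [0, 0, 1],
     [0, 0, 0]]
print(alpha_centrality(A, 0.5))  # [1.5, 1.0, 0.0]
```

Node 0 starts one path of length one (weight 1) and one path of length two (weight $\alpha = 0.5$), giving 1.5; node 1 starts a single path of length one; node 2 starts none.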

Let us now consider the case in which a node's capacity to receive incoming stimuli, whether messages or viruses, is limited and uniformly divided among all incoming connections. Therefore, the probability that node $j$ will receive a message broadcast by $i$ will be proportional to $1/d_{in}(j)$, where $d_{in}(j)$ is the in-degree of node $j$. The limited-attention Alpha-Centrality matrix can be written in terms of the modified adjacency matrix $M = AD_{in}^{-1}$ as:

$$C_{la} = M + \alpha M^2 + \alpha^2 M^3 + \alpha^3 M^4 + \cdots$$

The limited-attention Alpha-Centrality vector ${}^{la}ac(\alpha, s)$ can also be written in iterative form:

$${}^{la}ac(\alpha, s) = s + \alpha A D_{in}^{-1} \cdot {}^{la}ac(\alpha, s), \tag{3.4}$$

with the starting vector $s = AD_{in}^{-1}e^T$. Note that the transfer matrix $AD_{in}^{-1}$ is a stochastic matrix (each column belonging to a node with at least one in-link sums to one).
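The following sketch builds $M = AD_{in}^{-1}$ and iterates Eq. 3.4 on a small made-up graph. The function names and the example are assumptions; columns of nodes with no in-links are left as zeros rather than conditioned with a small constant as in the paper's experiments.

```python
def limited_attention_matrix(A):
    """Return M = A * D_in^{-1}: column j of A divided by the in-degree of j."""
    n = len(A)
    d_in = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return [[A[i][j] / d_in[j] if d_in[j] else 0.0 for j in range(n)]
            for i in range(n)]


def la_alpha_centrality(A, alpha, iters=500):
    """Fixed-point iteration x = s + alpha * M * x with s = M e^T (Eq. 3.4)."""
    n = len(A)
    M = limited_attention_matrix(A)
    s = [sum(row) for row in M]
    x = list(s)
    for _ in range(iters):
        x = [s[i] + alpha * sum(M[i][j] * x[j] for j in range(n))
             for i in range(n)]
    return x


# Hypothetical graph: 0 -> 1, 0 -> 2, 3 -> 2.
A = [[0, 1, 1, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 1, 0]]
print(la_alpha_centrality(A, 0.5))  # [1.5, 0.0, 0.0, 0.5]
```

Node 0 has node 1's full attention but only half of node 2's, which it shares with node 3; with unlimited attention the same two nodes would score 2 and 1.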

Figures 2(c) and (d) illustrate the differences between Alpha-Centrality and its limited-attention variant. Figure 2(c) shows the directed network with node sizes proportional to their AC scores. The Alpha-Centrality scores in this example were calculated for $\alpha = 0.85$. The rankings of nodes are similar to those produced by PageRank (Fig. 2(a)), though one node, for example, is relatively less important. In the limited-attention variant, shown in Fig. 2(d), the picture looks completely different. While B in (d) loses its importance because of its many in-links, node A becomes more central, since it receives incoming signals over a single in-link. Peripheral nodes are not judged to be central because, in contrast to PageRank with its random jumps, they never receive any signals.



Fig. 2. Directed network with node sizes weighted by their scores according to (a) PageRank (PR), (b) limited-attention PageRank (laPR), (c) Alpha-Centrality (AC), and (d) limited-attention Alpha-Centrality (laAC).



IV. Applications to Social Media 

We use the centrality measures proposed in this paper to identify influential people on social media. Correctly identifying such people can have far-reaching consequences for identifying noteworthy content, targeted information diffusion, and other applications. Since calculating Eq. 3.4 exactly is infeasible for such large networks, we used the approximate algorithms presented in the Appendix for these calculations. The Appendix also gives performance guarantees for the approximate algorithms.

Researchers have proposed a number of simple heuristics to identify influential social media users that rely, for example, on the number of followers or mentions [9], [27], [2]. Others have used centrality, analyzing the follower graph to find users with high PageRank scores [10]. However, since information spread on networks is traditionally described as an epidemic [19], [28], Alpha-Centrality may do a better job [12], since it explicitly models epidemic dynamics. We show, however, that limited-attention Alpha-Centrality, the measure that accounts for both the epidemic nature of social media broadcasts and the divided attention of its users, does a better job of identifying influential users than Alpha-Centrality.

Specifically, we study URL-sharing activity on Digg and Twitter, two popular social media sites for content sharing. Both sites allow users to follow other users by listing them as friends. The follower relation is asymmetric. When user A follows (becomes a fan of) B, she receives B's broadcasts, but not vice versa; we denote the relationship as B → A. Representing the follower graph in matrix form, a user's out-degree measures the number of followers she has, and her in-degree the number of friends she follows.

A. Data Collection: The Digg dataset contains more than 3 million votes on some 3500 stories promoted to Digg's front page in June 2009. More than 139K distinct users voted for at least one story in the data set (submission counts as the story's first vote). We call these users active users. Next, we extracted the friendship links created by active users and constructed a follower graph that contained active users who were following the activities of others. Only about 71K active users listed others as friends, resulting in a network with around 280K users and over 1.7 million links.

The Twitter data set was collected over a period of three weeks in October 2010 using the Gardenhose streaming API. We focused on tweets that included a URL in the body of the message. In order to ensure that we had the complete tweeting history of each URL, we used the search API to retrieve all tweets containing that URL. Users who tweeted the URL are considered active. The data collection process resulted in more than 3 million posts, tweeted by 816K users, which mentioned 70K distinct URLs. Next, we used the REST API to collect the followers of each active user, keeping only those followers who were themselves active, i.e., who tweeted at least one URL during the data collection period. The resulting follower graph had almost 700K nodes and over 36 million edges. More details of the data collection method are provided in [14].

B. Results: We calculate Alpha-Centrality (AC) and limited-attention Alpha-Centrality (laAC) on the Digg and Twitter follower graphs using the algorithm for laAC (Alg. 2) presented in the Appendix and the algorithm for AC presented in [15]. These are approximate algorithms with proven performance guarantees. We calculate limited-attention PageRank (laPR) on the transpose of the follower graph using Alg. 1, since a node's influence is related to the number of walks it generates rather than receives. The in- and out-degrees were conditioned by adding a small number (0.01) to avoid division by zero.

In order to compare the performance of centrality measures, we need a relevant measure of influence. When a user posts a URL on Digg or Twitter, she broadcasts it to all her followers. We refer to this user as the submitter. Whether or not a follower will re-broadcast the URL (i.e., retweet it on Twitter or vote for it on Digg) depends on its quality and on the submitter's influence. Assuming that a URL's quality is uncorrelated with the submitter, we can average out its effect by aggregating over all URLs submitted by the same user [12]. The residual difference between submitters can be attributed to variations in influence. Similar to [9], [14], [2], we use the average number of times the URLs submitted by a user are re-broadcast by her followers as the empirical measure of influence.
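The empirical influence measure described above amounts to a per-submitter average. The sketch below assumes a hypothetical record format of (submitter, URL, number of re-broadcasts by the submitter's followers); the user names and counts are invented for illustration and are not taken from the Digg or Twitter data.

```python
from collections import defaultdict


def empirical_influence(submissions):
    """Average number of follower re-broadcasts per submitted URL, per user.

    submissions: iterable of (submitter, url, rebroadcasts_by_followers).
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for user, _url, rebroadcasts in submissions:
        totals[user] += rebroadcasts
        counts[user] += 1
    return {user: totals[user] / counts[user] for user in totals}


# Hypothetical records for two submitters.
records = [
    ("alice", "http://example.org/a", 10),
    ("alice", "http://example.org/b", 20),
    ("bob", "http://example.org/c", 2),
]
print(empirical_influence(records))  # {'alice': 15.0, 'bob': 2.0}
```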

Figure 3 shows how well the rankings produced by different centralities correlate with the empirical influence rankings of users who submitted at least two URLs that were rebroadcast at least ten times. We use Spearman rank correlation because it is less sensitive to variations in scores, and we expect some variation to arise in approximate centrality scores. Limited-attention Alpha-Centrality correlates better with the empirical measure of influence than Alpha-Centrality over a broad range of $\alpha$ values, consistent with our claim that laAC is a better measure for predicting central social media users, because it models the dynamics of online communication better than AC. On Digg, AC appears to outperform laAC for small values of $\alpha$. Since $\alpha$ can be thought of as the scale of interaction, this implies that locally, AC better predicts influential users. This could be a consequence of the fact that our measure of influence, i.e., the number of re-broadcasts by followers, is a local measure. In the future, we plan to compare the performance of centrality measures using a global measure of influence, for example, the average size of cascades triggered by submitted URLs. We did not expect limited-attention PageRank (laPR) to predict influence rankings of Digg and Twitter users, since the dynamic process this centrality models does not describe the communication patterns of social media users, and indeed we found no correlation.

Fig. 3. Correlation of rankings of (a) Digg and (b) Twitter users found by different measures of centrality with the empirical influence ranking.
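For readers who want to reproduce this kind of comparison, a minimal standard-library sketch of Spearman rank correlation follows. It ranks both score lists (averaging ranks over ties, which is one common convention and an assumption here) and computes the Pearson correlation of the ranks. The function names and sample scores are illustrative only.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical centrality scores and influence values with the same ordering.
centrality = [0.1, 0.4, 0.2, 0.9]
influence = [3, 10, 5, 40]
print(spearman(centrality, influence))  # 1.0
```

Only the ordering matters: the two lists above differ greatly in scale yet are perfectly rank-correlated, which is why the measure tolerates the small score perturbations introduced by approximate algorithms.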

Interestingly, PageRank and laAC have similar performance, since laAC calculated on the adjacency matrix $A$ of the follower graph is almost identical to PR calculated on the transpose of $A$, except that the starting vectors are different in the two algorithms. This suggests that the dynamics of a random walk are almost equivalent to epidemic dynamics under the conditions of uniformly divided attention, when the direction of the flow is reversed. This observation could explain why PR can give good results in the social media domain. We leave the implications of this observation for future research.
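The near-equivalence can be checked directly from Eqs. 3.1 and 3.4. The short derivation below is a sketch in the paper's notation; $s'$ is introduced here only to distinguish the laAC starting vector from the PageRank one. It uses the fact that the out-degrees of the transposed graph are the in-degrees of the original graph.

```latex
% Sketch: PageRank on the transposed graph versus laAC on the original graph.
% s' (an assumed label) denotes the laAC starting vector.
\begin{align*}
D_{out}(A^{T}) &= D_{in}(A) \equiv D_{in}, \\
pr &= (1-\alpha)\,s + \alpha\, pr\, D_{in}^{-1} A^{T}
   && \text{(Eq. 3.1 applied to } A^{T}\text{)}, \\
pr^{T} &= (1-\alpha)\,s^{T} + \alpha\, A D_{in}^{-1}\, pr^{T}
   && \text{(transpose; } D_{in} \text{ is diagonal)}, \\
{}^{la}ac &= s' + \alpha\, A D_{in}^{-1}\; {}^{la}ac
   && \text{(Eq. 3.4 applied to } A\text{)}.
\end{align*}
```

Both fixed points are driven by the same operator $\alpha A D_{in}^{-1}$; they differ only in the starting vector and its $(1-\alpha)$ scaling.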



V. Conclusion 

Information flow in social networks, including online networks, is often modeled as an epidemic process, suggesting that centrality measures based on epidemics are appropriate for predicting influential social media users. We proposed a new centrality measure that takes into account the finite capacity of social media users to process incoming messages from friends. We modeled such limited attention by scaling the probability that a node receives a message by the inverse of its in-degree. We presented an approximate algorithm that allows us to efficiently calculate the proposed measure for the real-world social networks on Digg and Twitter. We showed empirically that a centrality measure that models limited-attention epidemics does a better job of predicting highly retweeted social media users than one that models simple epidemics. Our findings suggest that the nature of interactions among network nodes should determine how central nodes are identified.

Acknowledgements: This material is based upon work supported by the Air Force Office of Scientific Research under contracts FA9550-10-1-0569 and FA9550-10-1-0102, by the Air Force Research Laboratories under contract FA8750-12-2-0186, by DARPA under contract W911NF-12-1-0034, and by the National Science Foundation under grant CIF-1217605. PJ's internship was sponsored by the USC Viterbi-India Summer program.

Appendix: Approximate Algorithms 



Finding limited-attention PageRank (Eq. 3.2) and limited-attention Alpha-Centrality (Eq. 3.4) requires the computation of a matrix inverse, which takes $O(|V|^3)$ operations using a naive implementation ($|V|$ is the number of nodes in the network). This is prohibitively expensive for networks with thousands of nodes or more. Solving the equations iteratively requires $O(|V|^2)$ operations in each iteration, but we do not know how many iterations are sufficient for an optimal solution. We propose the Approximate Limited-Attention PageRank and Approximate Limited-Attention Alpha-Centrality algorithms, which can be used to calculate a near-optimal solution. The algorithms use a single error tolerance parameter $\delta$ ($0 < \delta < 1$) to control both the quality of the solution and the computation time.

The proposed algorithms and their performance guarantees are based on the approximate PageRank [1] and approximate Alpha-Centrality [15] algorithms. They provide a flexible way to compute the near-optimal centrality vector $\widetilde{cr}$ using a starting vector $s$ and a residual vector $r$. Initially $r = s$ and $\widetilde{cr} = 0$. The algorithms iteratively move weight from $r$ to $\widetilde{cr}$ until the values in the residual vector $r$ are sufficiently small. The performance guarantees of the proposed algorithms are given in Theorem 0.1 and Theorem 0.2, which are based on Lemma 0.1. The lemma states that each iteration maintains the invariant $\widetilde{cr} = cr(s) - cr(r) = cr(s - r)$. This means that the amount of error in the approximate centrality vector is equivalent to the error remaining in the residual vector.

Proposition 0.1. For any fixed value of $\alpha$ in $[0, 1]$ and starting vector $s$, $cr(\alpha, s)$ is linear in $s$.

Proof: The limited-attention PageRank vector ${}^{la}pr(\alpha, s)$ is the unique solution to

$$cr(s) = {}^{la}pr(\alpha, s) = (1 - \alpha)\,s + \alpha \cdot {}^{la}pr(\alpha, s)\,M,$$

where $M = D_{out}^{-1} A D_{in}^{-1}$. The limited-attention Alpha-Centrality vector ${}^{la}ac(\alpha, s)$ can also be written in iterative form:

$$cr(s) = {}^{la}ac(\alpha, s) = s + \alpha \cdot {}^{la}ac(\alpha, s)\,M,$$

where $M = AD_{in}^{-1}$. The centrality vectors can be proved linear with respect to $s$ by substituting suitable values for $cr(s)$ and $M$ in the proof presented in [15]. ■

Lemma 0.1. At the start of each iteration of the while loop, $\widetilde{cr} = cr(s) - cr(r) = cr(s - r)$, and the sum of the elements in $r$ decreases with each iteration.



Proof: The proof of correctness is based on Proposition 0.1. During initialization, $r = s$ and $\widetilde{cr} = 0$; therefore, $cr(s - r) = cr(0) = 0 = \widetilde{cr}$. The lemma is maintained throughout the execution of the loop. To prove this, we use a row vector $z_u$ such that $z_u(i) = 1$ if $i = u$; otherwise, $z_u(i) = 0$. Before the next iteration of the while loop in Algorithm 1 we have $\widetilde{cr}' = \widetilde{cr} + (1 - \alpha) z_i r(i)$ and $r' = r - z_i r(i) + \alpha z_i r(i) M$, where $\widetilde{cr}'$ and $r'$ are the updated centrality and residual vectors and $i$ is the vertex dequeued in line 11 of the algorithm. Now consider

$$\begin{aligned}
cr(r) &= cr(r - z_i r(i)) + cr(z_i r(i)) \\
&= cr(r - z_i r(i)) + (1 - \alpha) z_i r(i) + cr(\alpha z_i r(i) M) \\
&= cr(r - z_i r(i) + \alpha z_i r(i) M) + (1 - \alpha) z_i r(i) \\
&= cr(r') + \widetilde{cr}' - \widetilde{cr} = cr(r') + \widetilde{cr}' - cr(s - r).
\end{aligned}$$

It follows that $\widetilde{cr}' = cr(r) - cr(r') + cr(s - r) = cr(r - r' + (s - r)) = cr(s - r')$. On termination of the loop, given the lemma and an error tolerance parameter $\delta$, the approximate centrality vector should always satisfy

$$cr(s)[i] \ge \widetilde{cr}[i] \ge (1 - \delta)\, cr(s)[i] \quad \forall\, i \in V.$$

We choose a uniform starting vector $s$, $s[i] = \|s\|_1/|V| \;\forall\, i \in V$. The algorithm terminates when $r[i] < \epsilon\, d_{out}^{max} \;\forall\, i \in V$, so we choose $\epsilon = \dfrac{\delta \|s\|_1}{|V|\, d_{out}^{max}}$. With this choice of $\epsilon$ we also ensure freedom in the choice of the value of $\alpha$ within the range of 0 to 1. This freedom is achieved at the cost of increased running time of the algorithm. In the end $r[i] < \delta s[i]$; therefore, $cr(r)[i] < \delta\, cr(s)[i]$. Thus,

$$\widetilde{cr}[i] \ge (1 - \delta)\, cr(s)[i].$$

It is obvious that $cr(s)[i] \ge \widetilde{cr}[i]$; hence $cr(s)[i] \ge \widetilde{cr}[i] \ge (1 - \delta)\, cr(s)[i] \;\forall\, i \in V$. Also, the sum of all elements of $r'$ is

$$\sum r' = \sum r - r[i] + \alpha\, \frac{r[i]}{d_{out}(i)} \sum_{j \in N^{out}(i)} \frac{1}{d_{in}(j)}.$$

Since the value of $\alpha$ lies in $[0, 1]$ and $\sum_{j \in N^{out}(i)} \frac{1}{d_{in}(j)} \le d_{out}(i)$, the net sum of all values of the residual vector decreases with each iteration of the while loop. Similarly, we can prove that the lemma is valid for Algorithm 2. ■



A. Approximate Limited-Attention PageRank: Limited-attention PageRank (laPR), given by Eq. 3.2, can be written as the solution $cr(\alpha, s)$ of:

$$cr(\alpha, s)[j] = (1 - \alpha)\, s[j] + \alpha \sum_{i \in N^{in}(j)} \frac{cr(\alpha, s)[i]}{d_{out}(i)\, d_{in}(j)}.$$

Here $N^{in}(j)$ is the set of in-neighbors of $j$, i.e., nodes $i$ such that edge $(i, j) \in E$. Also, $N^{out}(j)$ is the set of out-neighbors of $j$, i.e., nodes $i$ such that $(j, i) \in E$. We take the starting vector $s = e/|V|$ to be uniform. To simplify notation, we will refer to $cr(\alpha, s)$ as $cr$.



Algorithm 1 Approximate limited-attention PageRank($V$, $E$, $s$, $\alpha$, $\delta$)

 1: $\epsilon = \delta \|s\|_1 / (|V|\, d_{out}^{max})$;
 2: $r = s$;
 3: Queue q = new Queue();
 4: for each $i \in V$ do
 5:     $\widetilde{cr}[i] = 0$;
 6:     if $r[i] / d_{out}^{max} > \epsilon$ then
 7:         q.add(i);
 8:     end if
 9: end for
10: while q.size() > 0 do
11:     i = q.dequeue();
12:     $\widetilde{cr}[i] = \widetilde{cr}[i] + (1 - \alpha)\, r[i]$;
13:     $T = \alpha\, r[i] / d_{out}(i)$;
14:     $r[i] = 0$;
15:     for each $j \in N^{out}(i)$ do
16:         $r[j] = r[j] + T / d_{in}(j)$;
17:         if !q.contains(j) and $r[j] / d_{out}^{max} > \epsilon$ then
18:             q.add(j);
19:         end if
20:     end for
21: end while
22: return $\widetilde{cr}$;
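A runnable sketch of Algorithm 1 is given below. It follows the queue-based listing, with two implementation choices that are assumptions of this sketch rather than part of the paper: a Python set tracks queue membership instead of calling `q.contains`, and the function also returns the residual vector so the stopping condition can be inspected. The graph is assumed to contain at least one edge and no self-loops.

```python
from collections import deque


def approx_la_pagerank(n, edges, alpha=0.85, delta=1e-3):
    """Push-style approximation of limited-attention PageRank (Algorithm 1).

    n: number of nodes labelled 0..n-1; edges: list of (u, v) pairs.
    Returns (cr, r): the approximate centrality and the leftover residual.
    """
    out_nbrs = [[] for _ in range(n)]
    d_in = [0] * n
    for u, v in edges:
        out_nbrs[u].append(v)
        d_in[v] += 1
    d_out = [len(nbrs) for nbrs in out_nbrs]
    d_max = max(d_out)

    s = [1.0 / n] * n                      # uniform starting vector
    eps = delta * sum(s) / (n * d_max)     # threshold from line 1
    r = list(s)
    cr = [0.0] * n
    queue = deque(i for i in range(n) if r[i] / d_max > eps)
    queued = set(queue)                    # membership test for the queue

    while queue:
        i = queue.popleft()
        queued.discard(i)
        cr[i] += (1.0 - alpha) * r[i]      # keep the (1 - alpha) share
        t = alpha * r[i] / d_out[i] if d_out[i] else 0.0
        r[i] = 0.0
        for j in out_nbrs[i]:
            r[j] += t / d_in[j]            # neighbor j divides its attention
            if j not in queued and r[j] / d_max > eps:
                queue.append(j)
                queued.add(j)
    return cr, r


# Hypothetical example graph.
cr, r = approx_la_pagerank(4, [(0, 1), (0, 3), (1, 2), (2, 0), (3, 2)])
```

When the loop ends, every residual entry is at most $\epsilon\, d_{out}^{max} = \delta \|s\|_1 / |V|$, which is the condition the accuracy argument relies on.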



Theorem 0.1. Given $0 < \alpha < 1$ and a uniform starting vector $s$, the approximate centrality vector $\widetilde{cr}$ is obtained from the algorithm in run time $O\!\left(\dfrac{|V|\, d_{out}^{max}}{\delta (1 - \alpha)}\right)$.

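Proof: Given an $\alpha$ in $[0, 1]$, Algorithm 1 works by dividing $\alpha r[i]$ equally amongst all $N^{out}(i)$ out-neighbors of node $i$. Each out-neighbor $j$ receives a fraction of the weight, based on its capacity, $d_{in}(j)$, to receive incoming messages. Hence, all $r[j]$ will increase by some fraction. Let $r$ be the old residual vector and $r'$ be the updated residual vector. The sum of all elements of the residual vector, $\sum r'$, is

$$\sum r' = \sum r - r[i] + \alpha\, \frac{r[i]}{d_{out}(i)} \sum_{j \in N^{out}(i)} \frac{1}{d_{in}(j)}.$$

The sum of the entries of the residual vector decreases by

$$\begin{aligned}
\sum r - \sum r' &= r[i] - \alpha\, \frac{r[i]}{d_{out}(i)} \sum_{j \in N^{out}(i)} \frac{1}{d_{in}(j)} \\
&\ge r[i] - \alpha\, \frac{r[i]}{d_{out}(i)} \cdot d_{out}(i) = (1 - \alpha)\, r[i] \ge (1 - \alpha)\, \epsilon\, d_{out}^{max}.
\end{aligned}$$

Let $k$ be the total number of iterations. The net amount removed from the residual vector will be at least

$$k (1 - \alpha)\, \epsilon\, d_{out}^{max} \le \|s\|_1, \qquad \text{so} \qquad k \le \frac{\|s\|_1}{(1 - \alpha)\, \epsilon\, d_{out}^{max}}.$$

Since the cost of each iteration is proportional to $d_{out}[i]$, the worst-case time complexity is $O\!\left(\dfrac{\|s\|_1}{(1 - \alpha)\, \epsilon}\right)$. For our choice of $\epsilon$, this is equivalent to $O\!\left(\dfrac{|V|\, d_{out}^{max}}{\delta (1 - \alpha)}\right)$. ■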



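B. Approximate Limited-Attention Alpha-Centrality: Limited-attention Alpha-Centrality (laAC), given by Eq. 3.4, can be rewritten as the solution $cr(\alpha, s)$ of:

$$cr(\alpha, s)[i] = s[i] + \alpha \sum_{j \in N^{out}(i)} \frac{cr(\alpha, s)[j]}{d_{in}(j)},$$

with the starting vector $s[i] = \sum_{j \in N^{out}(i)} 1/d_{in}(j)$. As before, we use $N^{out}(i)$ to denote the set of out-neighbors, and $N^{in}(i)$ the in-neighbors, of node $i$.

Algorithm 2 Approximate Limited-Attention Alpha-Centrality($V$, $E$, $s$, $\alpha$, $\delta$)

 1: $\epsilon = \delta \|s\|_1 / (|V|\, d_{in}^{max})$;
 2: $r = s$;
 3: Queue q = new Queue();
 4: for each $i \in V$ do
 5:     $\widetilde{cr}[i] = 0$;
 6:     if $r[i] / d_{in}^{max} > \epsilon$ then
 7:         q.add(i);
 8:     end if
 9: end for
10: while q.size() > 0 do
11:     i = q.dequeue();
12:     $\widetilde{cr}[i] = \widetilde{cr}[i] + r[i]$;
13:     $T = \alpha\, r[i] / d_{in}(i)$;
14:     $r[i] = 0$;
15:     for each $j \in N^{in}(i)$ do
16:         $r[j] = r[j] + T$;
17:         if !q.contains(j) and $r[j] / d_{in}^{max} > \epsilon$ then
18:             q.add(j);
19:         end if
20:     end for
21: end while
22: return $\widetilde{cr}$;

Theorem 0.2. Given $0 < \alpha < 1$ and starting vector $s$, the approximate centrality vector $\widetilde{cr}$ is obtained from the algorithm in run time $O\!\left(\dfrac{|V|\, d_{in}^{max}}{\delta (1 - \alpha)}\right)$.

Proof: Given an $\alpha$ in $[0, 1]$, let $r$ be the old residual vector and $r'$ be the updated residual vector, and let $j$ be the node dequeued in the current iteration. The sum of all elements of the residual vector, $\sum r'$, is

$$\sum r' = \sum r - r[j] + \alpha\, d_{in}(j)\, \frac{r[j]}{d_{in}(j)} = \sum r - r[j] + \alpha\, r[j].$$

The sum of the entries of the residual vector decreases by

$$\sum r - \left(\sum r - r[j] + \alpha\, r[j]\right) = r[j] - \alpha\, r[j] = (1 - \alpha)\, r[j] \ge (1 - \alpha)\, \epsilon\, d_{in}^{max}.$$

Let $k$ be the total number of iterations. The net amount removed from the residual vector will be at least

$$k (1 - \alpha)\, \epsilon\, d_{in}^{max} \le \|s\|_1, \qquad \text{so} \qquad k \le \frac{\|s\|_1}{(1 - \alpha)\, \epsilon\, d_{in}^{max}}.$$

Since the cost of each iteration is proportional to $d_{in}$, the worst-case time complexity is $O\!\left(\dfrac{\|s\|_1}{(1 - \alpha)\, \epsilon}\right)$. For our choice of $\epsilon$, this is equivalent to $O\!\left(\dfrac{|V|\, d_{in}^{max}}{\delta (1 - \alpha)}\right)$. ■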
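A runnable sketch of the push procedure for limited-attention Alpha-Centrality follows. Several details are assumptions of this sketch: the threshold $\epsilon = \delta \|s\|_1 / (|V|\, d_{in}^{max})$ is chosen by analogy with the PageRank variant, a set replaces the `q.contains` test, and nodes with no in-links simply push nothing. Each processed node adds its full residual to its score and sends $\alpha\, r[i] / d_{in}(i)$ to every in-neighbor, matching the recursive definition of laAC above.

```python
from collections import deque


def approx_la_alpha_centrality(n, edges, alpha=0.5, delta=1e-3):
    """Push-style approximation of limited-attention Alpha-Centrality.

    n: number of nodes labelled 0..n-1; edges: list of (u, v) pairs.
    Returns (cr, r): the approximate centrality and the leftover residual.
    """
    in_nbrs = [[] for _ in range(n)]
    out_nbrs = [[] for _ in range(n)]
    for u, v in edges:
        out_nbrs[u].append(v)
        in_nbrs[v].append(u)
    d_in = [len(nbrs) for nbrs in in_nbrs]
    d_max = max(d_in)

    # Starting vector: s[i] = sum over out-neighbors j of 1 / d_in(j).
    s = [sum(1.0 / d_in[j] for j in out_nbrs[i]) for i in range(n)]
    eps = delta * sum(s) / (n * d_max)     # assumed threshold, see lead-in
    r = list(s)
    cr = [0.0] * n
    queue = deque(i for i in range(n) if r[i] / d_max > eps)
    queued = set(queue)

    while queue:
        i = queue.popleft()
        queued.discard(i)
        cr[i] += r[i]                      # no (1 - alpha) factor here
        t = alpha * r[i] / d_in[i] if d_in[i] else 0.0
        r[i] = 0.0
        for j in in_nbrs[i]:
            r[j] += t                      # credit flows back to the senders
            if j not in queued and r[j] / d_max > eps:
                queue.append(j)
                queued.add(j)
    return cr, r


# Hypothetical graph: 0 -> 1, 0 -> 2, 3 -> 2.
cr, r = approx_la_alpha_centrality(4, [(0, 1), (0, 2), (3, 2)])
print(cr)  # [1.5, 0.0, 0.0, 0.5]
```

On this small graph no node with in-links ever holds residual, so the approximation coincides with the exact scores: node 0 earns 1 from node 1 and 0.5 from node 2, whose attention it shares with node 3.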



C. Performance of Approximate Algorithms: For relatively small networks (up to thousands of nodes), we compared centrality scores calculated by the approximate algorithms to those calculated by their exact versions. The USAir network^1 is an undirected network of 332 nodes and 4,252 edges, which represent airports linked by direct flights. The Powergrid network^2 is an undirected network of 4,941 nodes and 6,594 edges representing the topology of the US Western States power grid. The Gnutella dataset^3 contains a snapshot of the Gnutella peer-to-peer network with 6,301 nodes and 20,777 edges.

^1 http://vlado.fmf.uni-lj.si/pub/networks/data/
^2 http://cdg.columbia.edu/cdg/datasets
^3 http://snap.stanford.edu/data/

Fig. 4. Performance of the fast approximate limited-attention PageRank (laPR) and Alpha-Centrality (laAC) on the Gnutella, USAir, and Powergrid networks. Performance is measured by time (number of iterations of the approximate algorithm) and RMS error of the centrality values calculated by the approximate and exact algorithms.

Figure 4 shows the performance of the fast approximate algorithms proposed in this paper on the three networks vs. the error tolerance $\delta$. Performance is measured in terms of time (number of iterations) taken to compute approximate centrality values and the RMS error of these values compared to the values computed by the exact algorithms, Eqs. 3.2 and 3.4. In all cases, while it takes longer to compute centrality scores for decreasing values of $\delta$, the answers are closer to their exact values.
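The error measure used in this comparison can be sketched in a few lines. The paper does not spell out its exact formula, so the definition below (root of the mean squared difference over nodes) is the standard one and is stated here as an assumption; the function name and sample vectors are illustrative.

```python
import math


def rms_error(approx, exact):
    """Root-mean-square difference between two equally long score vectors."""
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(approx, exact)) / len(exact)
    )


# Hypothetical scores for a two-node network: one entry is off by 2.
print(rms_error([1.0, 2.0], [1.0, 4.0]))  # sqrt(2), about 1.414
```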

Figure 5 plots the number of iterations taken by
the proposed algorithms to calculate centralities for the
Digg and Twitter data sets for different values of the
error tolerance parameter $\delta$. Parameter values used in the
calculations were a = 9.0x 10~^ for both laAC and laPR 
on Digg, and a = 1 x 10~^ for laAC and a = 0.9 for 
laPR on Twitter. As expected, the number of iterations 
increases for smaller error tolerances. 